Use of Lexical Statistics for Compound Word Recognition and Segmentation in Turkish

نویسنده

  • Özkan Kiliç
چکیده

Compound words are cross-linguistic morphological phenomena that occur in all languages. Compound words are widely accepted to be stored in the lexicon but their constituents need to be accessed during both language learning and production processes. In this study, the use of corpora was investigated for how to differentiate single-stem words from single-word compounds and then how to segment compound words when no phonological information is available. Stems and morphs discovered in manual segmentations of the METU-Sabancı Turkish Treebank and the CHILDES were employed in the compound word recognition task and the results were compared. The METU Turkish Corpus (with about 2 million words) and a webcorpus (with about 490 million of Turkish words) were utilized in the segmentation task. The results emphasize that the lexicon can be morpheme-based; and lexical frequencies are effective heuristics in compound word recognition and segmentation.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

language development and lexical awareness of bilingual (Azeri -Persian) hard of hearing impaired children

The Relationship between Mean Length of utterance (MLU), Lexical Richness and syntactical and lexical metalinguistic Awareness in Bilingual (Turkish-Persian) normal and hearing impaired Children   Objectives: Regarding the impact of hearing loss on language development and metalinguistic skill and being language development different from metalinguistic skill in bilingual children, studying of...

متن کامل

Word-Forming Process in Azeri Turkish Language

The subject intended to study the general methods of natural word-forming in Azeri Turkish language. This study aimed to reach this purpose by analyzing the construction of compound Azeri Turkish words. Same’ei (2016) did a comprehensive study on word-forming process in Farsi, which was the inspiration source of this study for Azeri Turkish language word-forming. Numerous scholars had done vari...

متن کامل

Written word recognition by the elementary and advanced level Persian-English bilinguals

According  to  a  basic  prediction  made  by  the  Revised  Hierarchical  Model  (RHM),  at  early  stages  of language  acquisition,  strong  L2-L1  lexical  links  are  formed.  RHM  predicts  that  these  links  weaken with  increasing  proficiency,  although  they  do  not  disappear  even  at  higher  levels  of  language development. To test this prediction, two groups of highly proficie...

متن کامل

Turkish Resources for Visual Word Recognition

We report two tools to conduct psycholinguistic experiments on Turkish words. KelimetriK allows experimenters to choose words based on desired orthographic scores of word frequency, bigram and trigram frequency, ON, OLD20, ATL and subset/superset similarity. Turkish version of Wuggy generates pseudowords from one or more template words using an efficient method. The syllabified version of the w...

متن کامل

Lexical stress in continuous speech recognition

Human listeners use lexical stress for word segmentation and disambiguation. We look into using lexical stress for largevocabulary speech recognition for the Dutch language. It appears that beside vowels, consonants should be taken into account. By introducing stressed phonemes, and features for spectral bands and the fundamental frequency, we reduce the word error rate by 2.6 %.

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015